A family of fast exact pattern matching algorithms

Author

  • Igor O. Zavadskyi
Abstract

A family of comparison-based exact pattern matching algorithms is described. They use multi-dimensional arrays to process more than one adjacent text window in each iteration of the search cycle. This approach lowers the average time complexity at the cost of space. The algorithms of this family perform well for short patterns and medium-size alphabets. In such cases a shift of the window by several pattern lengths at once is quite probable, which is the main factor in the algorithms' success. Our algorithms outperform the Boyer-Moore-Horspool algorithm, either in its original version or with Sunday’s “Quick search” modification, over a wide area of the pattern length / alphabet size plane. In some subareas the proposed algorithms are the fastest among all known exact pattern matching algorithms. Namely, they perform best when the alphabet size is about 30–40 and the pattern length is about 4–10. Such parameters are typical for search in natural language text databases.
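To make the idea of inspecting two adjacent text windows per iteration more concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it is a simplified illustration built on the standard Horspool shift, and the names `horspool_shifts`, `two_window_search` and `shift2` are hypothetical. The two-dimensional table is indexed by the last characters of the current window and of the window right after it, and it is filled lazily in a dictionary rather than preallocated as an array.

```python
def horspool_shifts(pattern):
    """Classic Horspool table: for each character in pattern[0..m-2],
    the distance from its last occurrence to the end of the pattern.
    Characters not in the table implicitly shift by m."""
    m = len(pattern)
    return {pattern[k]: m - 1 - k for k in range(m - 1)}


def two_window_search(text, pattern):
    """Return all start positions of pattern in text.

    Assumption (illustration only): the shift is looked up in a 2D table
    keyed by (a, b), where a is the last character of the current window
    and b is the last character of the adjacent window.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hs = horspool_shifts(pattern)
    last = pattern[-1]
    shift2 = {}  # lazily filled 2D shift table: (a, b) -> shift

    def combined_shift(a, b):
        sa = hs.get(a, m)
        if sa < m:
            return sa              # a occurs inside the pattern: plain Horspool shift
        if b == last:
            return m               # adjacent window may match, so stop there
        return m + hs.get(b, m)    # skip the adjacent window as well (up to 2m)

    matches = []
    i = 0
    while i + m <= n:
        a = text[i + m - 1]
        # Verify the current window only if its last character fits.
        if a == last and text[i:i + m] == pattern:
            matches.append(i)
        if i + 2 * m <= n:
            key = (a, text[i + 2 * m - 1])
            s = shift2.get(key)
            if s is None:
                s = shift2[key] = combined_shift(*key)
        else:
            s = hs.get(a, m)       # near the end of the text: single-window shift
        i += s
    return matches


print(two_window_search("abracadabra", "abra"))
```

When neither inspected character occurs in the pattern, the window advances by two pattern lengths in one step, which is the effect the abstract attributes to short patterns over medium-size alphabets. The extra table dimension is where the additional space goes.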

Similar resources

Fast exact string matching algorithms

String matching is the problem of finding all the occurrences of a pattern in a text. We propose a very fast new family of string matching algorithms based on hashing q-grams. The new algorithms are the fastest in many cases, in particular on small alphabets. © 2007 Elsevier B.V. All rights reserved.

Exact and Approximate Two Dimensional Pattern Matching allowing Rotations

We give fast filtering algorithms for searching a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size n × n characters and a pattern of size m × m characters, the exact matching takes average time O(n^2/m). If we allow k-mismatches o...

Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations

We give fast filtering algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size n × n characters and a pattern of size m × m characters, the exact matching takes average time O(n^2 log m / m^2), which is optimal. ...

Occurrence and Substring Heuristics for δ-Matching

We consider a version of pattern matching useful in processing large musical data: δ-matching, which consists in finding matches which are δ-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a − b|. We also consider (δ, γ)-matching, where γ is a bound on the total sum of the diff...

Journal title:
  • CoRR

Volume: abs/1608.08346

Publication date: 2016